Distance Measures and Smoothing Methodology for Imputing Features of Documents

Authors

  • Andrey FEUERVERGER
  • Peter HALL
  • Gelila TILAHUN
  • Michael GERVERS
Abstract

We suggest a new class of metrics for measuring distances between documents, generalizing the well-known resemblance distance. We then show how to combine distance measures with statistical smoothing to develop techniques for imputing missing features of documents. We treat in detail the case where these features are continuous variates, but we note that our methods can be adapted to settings where the features are ordered or unordered categorical variates (e.g., the names of potential authors of the documents). The results of applying our ideas to the dating of medieval manuscripts are briefly summarized.
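The combination of a document distance with statistical smoothing can be sketched as follows. This is a minimal illustration, not the authors' method: it uses only the classical resemblance distance (one minus the Jaccard overlap of word shingles), not the generalized class of metrics proposed in the paper, and it assumes a Gaussian kernel with a fixed bandwidth. The function names, the shingle length `k`, and the `bandwidth` value are all hypothetical choices made for this example.

```python
import math


def shingles(text, k=2):
    """Return the set of k-word shingles (contiguous word runs) in text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def resemblance_distance(a, b, k=2):
    """One minus the Jaccard overlap of the two documents' shingle sets."""
    sa, sb = shingles(a, k), shingles(b, k)
    union = sa | sb
    if not union:
        # Assumption: documents too short to have any shingle count as identical.
        return 0.0
    return 1.0 - len(sa & sb) / len(union)


def impute(target, labelled_docs, bandwidth=0.5, k=2):
    """Kernel-weighted average of known feature values.

    labelled_docs is a list of (text, value) pairs; documents closer to the
    target (smaller distance) receive larger Gaussian kernel weights.
    """
    if not labelled_docs:
        raise ValueError("need at least one labelled document")
    num = den = 0.0
    for text, value in labelled_docs:
        d = resemblance_distance(target, text, k)
        w = math.exp(-0.5 * (d / bandwidth) ** 2)
        num += w * value
        den += w
    return num / den
```

For a continuous feature such as a date, `impute` returns a weighted mean that always lies between the smallest and largest known values. For a categorical feature such as an author name, the same weights could instead be summed per category and the largest total chosen, though that variant is not shown here.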

Similar resources

Using statistical smoothing to date medieval manuscripts

We discuss the use of multivariate kernel smoothing methods to date manuscripts dating from the 11th to the 15th centuries, in the English county of Essex. The dataset consists of some 3300 dated and 5000 undated manuscripts, and the former are used as a training sample for imputing dates for the latter. It is assumed that two manuscripts that are “close”, in a sense that may be defined by a ve...

Valuing Indirect Citations in Citation Networks using Data Fusion

Any scientific activity requires awareness of previous related activities. Citation networks are the networks in which each document is compared as a link of a chain with its previous and next documents, and the documents with the highest number of citations are considered as the most effective ones in a domain. Most of the introduced methods use direct citations for valuing the documents. One ...

Investigating the applicability of centrality measures as indicators of citation-based document relatedness in relational information retrieval: a preliminary study

Purpose: This pilot study investigates the correlation between centrality measures and bibliographic coupling, a well-known citation-based document similarity measure. Methodology: Using citation analysis, 40 research articles belonging to four engineering/pure disciplines (Physics, Chemistry, Biology, and Computer) and four Humanities and Social disciplines (Economics, Edu...

Protection of Archival Documents from Photochemical Effects

Purpose: The purpose of this paper is to highlight the destructive effects of light on archival documents/paper materials. The research aims to explain the mechanism of photochemical degradation and the damaging effect of light on paper. It also tells us about the measures to be adopted to control the deteriorating effects of light on paper step by step. Design/Methodology/Approach: The res...

Removing car shadows in video images using entropy and Euclidean distance features

Detecting car motion in video frames is one of the key subjects in the computer vision community. In recent years, different approaches have been proposed to address this issue. One of the main challenges for image processing systems developed for car detection is the shadows that cars cast. Shadows change the appearance of cars in such a way that they may seem attached to neighboring cars. This study aims ...

Publication date: 2005